March 15, 2017
These slides: http://www.databrew.cc/cism
Reason 1: It's free
Reason 2: It's "open source"
Reason 3: It's beautiful
Reason 4: It's powerful
Reason 5: It's fun
Download R: https://www.r-project.org/
Download RStudio: https://www.rstudio.com/products/rstudio/download/
Let's write some code!
2 + 2
[1] 4
x <- c(1,2,3,4,5)
x
[1] 1 2 3 4 5
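Most arithmetic in R works on a whole vector at once. A small sketch using the same x as above (the names doubled and total are only for illustration):

```r
# The same vector as above
x <- c(1, 2, 3, 4, 5)

# Arithmetic is applied to every element of the vector
doubled <- x * 2
doubled

# Summary functions collapse the vector to a single number
total <- sum(x)
total
```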
barplot(x)
A "package" is simply a collection of code written by someone else.
It's what makes R powerful, but also confusing.
You only have to install a package one time.
install.packages('dplyr')
install.packages('devtools')
devtools::install_github('databrew/databrew')
devtools::install_github('joebrew/cism')
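Because a package only needs to be installed once, a script can first check what is missing. A minimal sketch, assuming a helper called missing_packages (a hypothetical name, not part of any package) built on base R's requireNamespace:

```r
# Hypothetical helper (not part of any package): returns the names of the
# packages in `pkgs` that cannot be loaded, usually because they are not
# installed on this computer.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c('dplyr', 'devtools'))
to_install

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the sketch never downloads anything by accident.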
You have to call the library function in every new R session in which you use a package.
library(databrew)
library(cism)
library(sp)
Writing library just means "I am going to use this package".
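There is a second way to use a package: name it explicitly with two colons, as the devtools:: lines above do. A short sketch of both styles, using the stats package only because it ships with R and needs no installation:

```r
# Option 1: attach the package, then call its functions by name
library(stats)
median(c(1, 5, 9))

# Option 2: skip library() and name the package explicitly with ::
stats::median(c(1, 5, 9))
```

Both calls run the same function; the :: form is handy when you only need one function from a package.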
Since we've already written library(cism), now we can use some tools from the cism package.
plot(moz0)
plot(man3)
a <- 1
a + 3
[1] 4
Let's create an object called "ages", with the age of everyone
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)
How do we view our ages object?
ages
[1] 30 26 31 39 45 27 28 22 19 30 35
How do we view just the first element of our ages object?
ages[1]
[1] 30
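Brackets can select more than one element. A small sketch with the same ages as above (the object names first_three, without_first and over_30 are only for illustration):

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

# The first three elements
first_three <- ages[1:3]
first_three

# Everything except the first element
without_first <- ages[-1]
without_first

# Only the ages above 30
over_30 <- ages[ages > 30]
over_30
```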
How do we sort our ages object?
sorted_ages <- sort(ages)
sorted_ages
[1] 19 22 26 27 28 30 30 31 35 39 45
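Two variations on sorting, sketched with the same ages: sort can go from largest to smallest, and order returns positions instead of values (oldest_first and positions are illustrative names):

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

# Largest first
oldest_first <- sort(ages, decreasing = TRUE)
oldest_first

# order() gives the positions that would sort the vector
positions <- order(ages)
positions

# The first position points at the youngest person
ages[positions[1]]
```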
How do we get the minimum, maximum, average age?
min(ages)
[1] 19
max(ages)
[1] 45
mean(ages)
[1] 30.18182
How do we visualize our ages object?
hist(ages)
Previously, we looked at a one dimensional object: ages.
But most data is two dimensional: rows and columns.
This is called a data frame.
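A data frame can also be typed in by hand. A minimal sketch with made-up people (the names and values are invented for illustration only):

```r
# Each argument becomes one column; all columns have the same length
people <- data.frame(
  name = c('Ana', 'Beto', 'Carla'),
  age  = c(30, 26, 31)
)

people

# Rows and columns
nrow(people)
ncol(people)
```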
Let's play around with some real data.
Let's create a simple data frame.
www.databrew.cc/frangos.csv
frangos <- databrew::frangos
head(frangos)
# A tibble: 6 x 4
  diet  chick  days grams
  <chr> <int> <dbl> <int>
1 corn      1 0.192    42
2 corn      1 1.01     51
3 corn      1 4.52     59
4 corn      1 6.72     64
5 corn      1 8.14     76
6 corn      1 9.11     93
Let's explore.
Brackets: []
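With a data frame, brackets take two parts: rows before the comma, columns after it. A sketch, assuming a small stand-in called frangos_head (a hypothetical object typed in from the first rows shown above, so it runs without the databrew package):

```r
# Stand-in for the first three rows of frangos
frangos_head <- data.frame(
  diet  = c('corn', 'corn', 'corn'),
  chick = c(1L, 1L, 1L),
  days  = c(0.192, 1.01, 4.52),
  grams = c(42L, 51L, 59L)
)

# First row, all columns
frangos_head[1, ]

# All rows, one column
frangos_head[, 'grams']

# The same column using $
frangos_head$grams

# Only the rows where the chick weighs more than 50 grams
frangos_head[frangos_head$grams > 50, ]
```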
Always save your scripts.
Never save your "workspace".
Work in "projects"
We're going to use the cism package to get weather data for the FQMA weather station (Maputo).
library(cism)
??get_weather
weather <- get_weather(station = 'FQMA',
start_year = 2010,
end_year = 2016)
Now that we have our weather data, we can look at it.
head(weather)
# 1. How many rows are in our data?
nrow(weather)
# 2. How many columns?
ncol(weather)
# 3. What are the names of the columns?
colnames(weather)
# 4. What is the date range?
range(weather$date)
# 5. What is the maximum temperature?
max(weather$temp_max, na.rm = TRUE)
# 6. What is the minimum temperature?
min(weather$temp_min, na.rm = TRUE)
# 7. What is the average temperature?
mean(weather$temp_mean, na.rm = TRUE)
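Why na.rm = TRUE? Real data often has missing values (NA), and most summary functions return NA when any value is missing. A small sketch with made-up temperatures (temps is an illustrative vector, not the weather data):

```r
# One reading is missing
temps <- c(28.5, NA, 31.2, 25.0)

# Without na.rm the result is NA
max_with_na <- max(temps)
max_with_na

# With na.rm = TRUE the missing value is dropped first
max_without_na <- max(temps, na.rm = TRUE)
max_without_na

mean_without_na <- mean(temps, na.rm = TRUE)
mean_without_na
```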
Which variables do we have which are numeric and continuous?
temp_max, temp_mean, temp_min, etc.
How can we visualize these?
boxplot(weather$temp_mean)
hist(weather$temp_mean)
Let's create a variable called "hot"
weather$hot <- ifelse(weather$temp_max > 30, 'hot', 'not hot')
head(weather)
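The ifelse function checks the condition for every element and picks one of two values each time. A sketch with made-up temperatures (this temp_max is a toy vector, not the weather data):

```r
# Four made-up daily maximum temperatures
temp_max <- c(27, 31, 30, 35)

# 'hot' where the condition is TRUE, 'not hot' where it is FALSE
hot <- ifelse(temp_max > 30, 'hot', 'not hot')
hot

# Count how many days fall in each group
table(hot)
```

Note that 30 itself is labelled 'not hot', because the condition is strictly greater than 30.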
table(weather$hot)
hot_table <- table(weather$hot)
hot_prop_table <- prop.table(hot_table)
barplot(hot_table)
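What does prop.table do? It turns counts into proportions that add up to 1. A sketch with a made-up vector of labels (toy data, not the weather data):

```r
# Four made-up days
hot <- c('hot', 'not hot', 'hot', 'hot')

# Counts per group
hot_table <- table(hot)
hot_table

# Proportions per group
hot_prop_table <- prop.table(hot_table)
hot_prop_table

# As percentages
round(100 * hot_prop_table)
```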
barplot(hot_table,
main = 'Hot days in Maputo')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature',
col = c('red', 'blue'))
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature',
col = c('red', 'blue'),
border = 'darkgrey')
Let's create a plot of date (x-axis) and the maximum temperature
plot(weather$date,
weather$temp_max)
Let's make our plot prettier
plot(weather$date,
weather$temp_max,
type = 'l',
col = 'red',
xlab = 'Date',
ylab = 'Maximum temperature',
main = 'Maximum temperature in Maputo')
We're going to analyze where Joe is, using data from google. The data is part of the databrew package.
# Load package
library(databrew)
# Get data
joe <- joe
Let's have a look at the structure of our data.
head(joe)
        date                time longitude  latitude velocity altitude
1 2017-03-13 2017-03-13 11:08:06  32.79699 -25.40760       NA       NA
2 2017-03-13 2017-03-13 11:06:01  32.79699 -25.40760       NA       NA
3 2017-03-13 2017-03-13 11:05:32  32.80439 -25.40608       NA       NA
4 2017-03-13 2017-03-13 11:03:03  32.80439 -25.40608       NA       NA
5 2017-03-13 2017-03-13 11:01:03  32.80545 -25.40844       NA       NA
6 2017-03-13 2017-03-13 11:00:16  32.80545 -25.40779       NA       NA
  heading accuracy
1      NA     2500
2      NA     2500
3      NA     1899
4      NA     1899
5      NA      400
6      NA      699
Let's filter our data so that it only contains observations for the period from March 7-13.
joe_filtered <- joe[joe$date >= '2017-03-07' &
joe$date <= '2017-03-13',]
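The same kind of filter can be tried on a tiny data frame first. A sketch with made-up data (visits, its dates and its places are invented for illustration):

```r
# Four made-up observations
visits <- data.frame(
  date  = as.Date(c('2017-03-05', '2017-03-07', '2017-03-10', '2017-03-14')),
  place = c('A', 'B', 'C', 'D')
)

# TRUE for the rows inside the period, FALSE for the others
in_range <- visits$date >= as.Date('2017-03-07') &
  visits$date <= as.Date('2017-03-13')
in_range

# Keep only the TRUE rows; the empty space after the comma keeps all columns
visits_filtered <- visits[in_range, ]
visits_filtered
```

Both ends of the period are included, because the comparisons use >= and <=.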
Now let's use the cism package to plot Manhiça.
library(cism)
library(sp)
manhica <- man3
plot(manhica)
The databrew package has a nice function called visualize_location. Let's try it out.
?visualize_location
visualize_location(x = joe_filtered,
spdf = manhica)
Let's also try with an interactive map
visualize_location(x = joe_filtered,
use_leaflet = TRUE)